Machine learning has permeated almost every domain, yielding excellent results. It has also found an outlet in digital forensics, where it is becoming a major driver of computational efficiency. A prominent strength of ML algorithms is feature extraction, which can be instrumental in digital forensics applications; convolutional neural networks, for instance, have been used to identify fragments of files. We observe, however, that the literature contains little work on identifying the algorithm used to compress a file fragment. Through this study we attempt to address this gap, which is challenging because compression algorithms produce relatively high entropy as they make data more compact. We took a base dataset, compressed each file with several algorithms, and designed a model accordingly. The resulting model was able to accurately identify files compressed with the compress, LZIP, and BZIP2 algorithms.
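The entropy obstacle mentioned above can be checked directly. A minimal sketch (ours, not the paper's pipeline, using Python's standard-library codecs as stand-ins for the algorithms studied) shows that compressed output looks nearly random regardless of codec, which is why distinguishing codecs needs learned features rather than entropy alone:

```python
import bz2
import lzma
import math
import random
import zlib
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (8.0 for uniformly random bytes)."""
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Redundant English-like plaintext (invented for illustration) compresses well...
random.seed(0)
words = ["forensic", "fragment", "entropy", "model", "data", "file"]
plain = " ".join(random.choices(words, k=20000)).encode()

# ...and every codec's output has much higher entropy than the plaintext,
# so a simple entropy measure cannot tell the codecs apart.
for name, compress in [("zlib", zlib.compress), ("bz2", bz2.compress), ("lzma", lzma.compress)]:
    print(f"{name}: {shannon_entropy(compress(plain)):.2f} bits/byte "
          f"vs plaintext {shannon_entropy(plain):.2f}")
```

The near-identical, near-maximal entropies across codecs motivate a CNN that learns subtler byte-level structure (headers, block framing, symbol statistics) instead.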
Nonlinear activation functions endow neural networks with the ability to learn complex high-dimensional functions. The choice of activation function is an important hyperparameter that determines the performance of deep neural networks: it significantly affects gradient flow, training speed, and ultimately the representational power of the network. Saturating activation functions like sigmoids suffer from the vanishing gradient problem and cannot be used in deep networks. Universal approximation theorems guarantee that multilayer networks of sigmoid or ReLU units can learn arbitrarily complex continuous functions to any accuracy. However, each neuron in a conventional neural network (one using sigmoid- or ReLU-like activations) has a single hyperplane as its decision boundary and therefore performs a linear classification; thus a single neuron with a sigmoid, ReLU, Swish, or Mish activation cannot learn the XOR function. Recent research has discovered biological neurons, in layers two and three of the human cortex, that have oscillating activation functions and can individually learn the XOR function. The presence of oscillating activation functions in biological neurons may partially explain the performance gap between biological and artificial neural networks. This paper proposes four new oscillating activation functions that enable individual neurons to learn the XOR function without manual feature engineering, and explores the possibility of using oscillating activations to solve classification problems with fewer neurons and reduced training time.
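The single-neuron XOR claim is easy to verify. A short sketch (our own illustration with a plain sine activation, not one of the paper's four proposed functions) shows one neuron computing XOR exactly, which no single sigmoid or ReLU neuron can do:

```python
import math

def neuron(x1: int, x2: int,
           w1: float = math.pi / 2, w2: float = math.pi / 2, b: float = 0.0) -> float:
    """A single neuron with an oscillatory activation: y = sin(w1*x1 + w2*x2 + b)."""
    return math.sin(w1 * x1 + w2 * x2 + b)

# With both weights set to pi/2, the pre-activation is 0, pi/2, pi/2, pi
# on the four binary inputs, and sin maps these to 0, 1, 1, 0 -- exactly XOR.
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, round(neuron(x1, x2)))  # prints 0, 1, 1, 0 down the column
```

The oscillation is what makes this possible: the neuron's decision boundary consists of multiple parallel hyperplanes rather than one, so a single unit can separate a non-linearly-separable pattern.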
Monitoring water is a complex task due to its dynamic nature, added pollutants, and land build-up. The availability of high-resolution data from Sentinel-2 multispectral products makes implementing remote sensing applications feasible. However, overutilizing or underutilizing the product's multispectral bands can lead to inferior performance. In this work, we compare the performance of ten of the thirteen bands available in a Sentinel-2 product for water segmentation using eight machine learning algorithms. We find that the shortwave infrared bands (B11 and B12) are the most effective for segmenting water bodies: B11 achieves an overall accuracy of $71\%$ while B12 achieves $69\%$ across all algorithms on the test site. We also find that the Support Vector Machine (SVM) is the most favourable algorithm for single-band water segmentation, achieving an overall accuracy of $69\%$ across the tested bands over the given test site. Finally, to demonstrate the effectiveness of choosing the right amount of data, we use only B11 reflectance data to train an artificial neural network, BandNet. Even with a basic architecture, BandNet performs comparably to known architectures for semantic and water segmentation, achieving a $92.47$ mIOU on the test site. BandNet requires only a fraction of the time and resources to train and run inference, making it suitable for deployment in web applications that monitor water bodies in localized regions. Our codebase is available at https://github.com/IamShubhamGupto/BandNet.
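A useful observation behind single-band segmentation is that a linear SVM on one feature reduces to a reflectance threshold: for separable 1-D data the maximum-margin boundary is the midpoint between the closest opposite-class samples. A minimal sketch (reflectance values are invented; only the direction — water absorbing strongly in SWIR — follows the abstract):

```python
# Hypothetical SWIR (B11-like) reflectance samples: water absorbs shortwave
# infrared strongly, so water pixels sit well below land/vegetation pixels.
water = [0.02, 0.03, 0.05, 0.04]
land = [0.25, 0.30, 0.22, 0.28]

def fit_threshold(neg: list, pos: list) -> float:
    """Hard-margin linear 'SVM' in one dimension: the max-margin boundary is
    the midpoint between the two support vectors (closest opposing samples)."""
    return (max(neg) + min(pos)) / 2

t = fit_threshold(water, land)

def predict(reflectance: float) -> int:
    """1 = land, 0 = water."""
    return int(reflectance > t)

print(round(t, 3))  # midpoint between 0.05 and 0.22 -> 0.135
```

This is why a single well-chosen band can carry most of the signal, and why BandNet can stay small while matching larger architectures.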
In this paper, we discuss an imitation learning based method for reducing the calibration error of a mixed reality system consisting of a vision sensor and a projector. Unlike a head-mounted display, in this setup augmented information is made available to a human subject by projecting the scene into the real world. Inherently, the camera and projector need to be calibrated as a stereo setup to project accurate information in 3D space. Previous calibration processes require multiple recording and parameter-tuning steps to achieve the desired calibration, which is usually a time-consuming process. To avoid such tedious calibration, we train a CNN model to iteratively correct the extrinsic offset given a QR code and a projected pattern. We discuss the overall system setup, data collection for training, and results of the auto-correction model.
Language-conditioned policies allow robots to interpret and execute human instructions. Learning such policies requires a substantial investment of time and compute resources, yet the resulting controllers are highly device-specific and cannot easily be transferred to a robot with different morphology, capability, appearance, or dynamics. In this paper, we propose a sample-efficient approach for training language-conditioned manipulation policies that allows for rapid transfer across different types of robots. By introducing a novel method, namely Hierarchical Modularity, and adopting supervised attention across multiple sub-modules, we bridge the divide between modular and end-to-end learning and enable the reuse of functional building blocks. In both simulated and real-world robot manipulation experiments, we demonstrate that our method outperforms the current state-of-the-art methods and can transfer policies across 4 different robots in a sample-efficient manner. Finally, we show that the functionality of learned sub-modules is maintained beyond the training process and can be used to introspect the robot decision-making process. Code is available at https://github.com/ir-lab/ModAttn.
We propose SparseFusion, a sparse view 3D reconstruction approach that unifies recent advances in neural rendering and probabilistic image generation. Existing approaches typically build on neural rendering with re-projected features but fail to generate unseen regions or handle uncertainty under large viewpoint changes. Alternate methods treat this as a (probabilistic) 2D synthesis task, and while they can generate plausible 2D images, they do not infer a consistent underlying 3D. However, we find that this trade-off between 3D consistency and probabilistic image generation does not need to exist. In fact, we show that geometric consistency and generative inference can be complementary in a mode-seeking behavior. By distilling a 3D consistent scene representation from a view-conditioned latent diffusion model, we are able to recover a plausible 3D representation whose renderings are both accurate and realistic. We evaluate our approach across 51 categories in the CO3D dataset and show that it outperforms existing methods, in both distortion and perception metrics, for sparse-view novel view synthesis.
Hearing-impaired people face many obstacles in communication and require an interpreter to comprehend what a person is saying. Despite constant scientific research, existing models lack the ability to make accurate predictions. We therefore propose a deep learning model trained on American Sign Language (ASL) that takes actions in the form of ASL as input and translates them into text. To achieve the translation, a Convolutional Neural Network model and a transfer learning model based on the VGG16 architecture are used. Accuracy improves from 94% with the CNN to 98.7% with transfer learning, an improvement of nearly 5 percentage points. An application with the deep learning model integrated has also been built.
Recent video+language datasets cover domains where the interaction is highly structured, such as instructional videos, or where the interaction is scripted, such as TV shows. Both of these properties can lead to spurious cues to be exploited by models rather than learning to ground language. In this paper, we present GrOunded footbAlL commentaries (GOAL), a novel dataset of football (or `soccer') highlights videos with transcribed live commentaries in English. As the course of a game is unpredictable, so are commentaries, which makes them a unique resource to investigate dynamic language grounding. We also provide state-of-the-art baselines for the following tasks: frame reordering, moment retrieval, live commentary retrieval and play-by-play live commentary generation. Results show that SOTA models perform reasonably well in most tasks. We discuss the implications of these results and suggest new tasks for which GOAL can be used. Our codebase is available at: https://gitlab.com/grounded-sport-convai/goal-baselines.
Industries around the world must follow government rules and regulations to classify products when assessing duties and taxes for international shipment. The Harmonized System (HS) is the most standardized numerical method of classifying traded products among industry classification systems. A hierarchical ensemble model comprising a BERT transformer, NER, distance-based approaches, and knowledge graphs has been developed to address scalability, coverage, the ability to capture nuances, automation, and auditing requirements when classifying unknown text descriptions according to the HS method.
Many scientific domains gather sufficient labels to train machine learning algorithms through human-in-the-loop techniques provided by the Zooniverse.org citizen science platform. As the range of projects, task types, and data rates increases, accelerating model training is of paramount concern so that volunteer effort can be focused where it is most needed. The application of Transfer Learning (TL) between Zooniverse projects holds promise as a solution. However, understanding the effectiveness of TL approaches that pretrain on large-scale generic image sets versus images with similar characteristics, possibly from similar tasks, is an open challenge. We apply a generative segmentation model to two Zooniverse project-based data sets: (1) identifying fat droplets in liver cells (FatChecker; FC) and (2) identifying kelp beds in satellite images (Floating Forests; FF) through transfer learning from the first project. We compare and contrast its performance with a TL model based on the COCO image set, and subsequently with baseline counterparts. We find that both the FC and COCO TL models perform better than the baseline cases when using >75% of the original training sample size. The COCO-based TL model generally performs better than the FC-based one, likely due to its generalized features. Our investigations provide important insights into the usage of TL approaches on multi-domain data hosted across different Zooniverse projects, enabling future projects to accelerate task completion.